Master equation approach to protein folding and kinetic traps 
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The master equation for 12-monomer lattice heteropolymers is solved numerically and the time 
evolution of the occupancy of the native state is determined. At low temperatures, the median 
folding time follows the Arrhenius law and is governed by the longest relaxation time. For good 
folders, significant kinetic traps appear in the folding funnel whereas for bad folders, the traps also 
occur in non-native energy valleys. 

PACS numbers: 87.15. By, 87.10. +c 



The key problem in protein folding is one of dy- 
namics. Tremendous progress has been made in un- 
derstanding equilibrium properties of simplified lat- 
tice models Jl) . Such studies have demonstrated the 
requirement of thermodynamic stability [0,0 , stabil- 
ity against mutations || and a linkage between rapid 
folding and stability of the native state M (nat). 
Monte Carlo studies of folding have been helpful in 
elucidating the folding funnel ^j^] and meaningful 
relationships with experiment are being established 
||. So far, the approaches to studies of the folding 
dynamics have been restricted to Monte Carlo simu- 
lations that start from a few randomly chosen initial 
conformations |tJ and the enumeration of transition 
rates between classes of conformations which have 
the same number of contacts and are a given num- 
ber of kinetic steps away from the NAT ||. The 
approximations involved in these approaches remain 
necessarily untested. 

In this letter, we present an exact method to study 
the dynamics of short model proteins based on the 
master equation B. To illustrate the method, we 
present results for sequences made of 12 monomers 
and placed on a square lattice. We focus on two se- 
quences, A and B, which have good and bad folding 
properties respectively. We find that the dynamics 
of A and B are superficially similar: for both, the 
median folding time, £foid) and the longest relaxation 
time, Ti diverge at low T according to an Arrhenius 
law. However, what distinguishes the two cases is 
the location of the folding transition temperature, 
Tf, with respect to the temperature T m i n at which 
folding to the native state proceeds the fastest. Tf 
is defined as the temperature at which the equilib- 
rium value of the probability to occupy the NAT, P , 



crosses ^ and is a measure of thermodynamic stabil- 
ity. For bad folders, Tf is well below T min and thus 
a substantial occupation probability for the NAT is 
found only in a temperature range in which the dy- 
namics are glassy. 

A deeper understanding of the differences between 
A and B is obtained by the identification of kinetic 
traps through an analysis of the eigenvectors corre- 
sponding to the longest relaxation time. The most 
potent kinetic trap for sequence A is within the fold- 
ing funnel and is a few steps away from the NAT. The 
energy needed to exit the trap determines the bar- 
rier, 5E, in the Arrhenius law, tfoid ~ exp(SE/T). 
For the bad folder, the relevant trap forms its own 
energy valley and exiting it requires full unfolding. 
The are many ways to unfold and the effective 8E 
is entropy influenced - the bottleneck arises from a 
search process. 

Method: Consider a lattice polymer which can 
exist in AT conformations (A/"=15037 for 12-monomer 
sequences). Let P a = P a (t) be the probability of 
finding the sequence in conformation a at time t. 
The master equation is 



dP a 
dt 



EK/3 



a)P p - w(a f3)P a ] (1) 



where w a /3 = w((3 — > a) is the transition rate from 
conformation to conformation a. We bring this 
into a matrix form by letting P — (Pi, . . . , P^f) and 



h a p = -w a p < if a ^ (3 , h aa = 



(2) 



The master equation then takes the form of an 
imaginary-time Schrodinger equation dtP — —HP, 
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where the h a p are the matrix elements of H. While 
this reformulation is standard fic|j , it has regained 
interest recently because H can often be related to 
integrable quantum systems pl| . 

It is well known jl(| that the conditions (||) are 
necessary and sufficient for a matrix {H) a p — h a p 
to give rise to a stochastic Markov process. In par- 
ticular, it follows that if initially < P a < 1 for all 
conformations, this will hold true at all subsequent 
times. Time-dependent averages for any observable 
X are found from 

(X)(t) = (s\Xe~ kt \P m ) (3) 

where X is the matrix representation of X, (s| = 
(I, . . . , 1) is the left steady state of H with (s\H = 
and \Pi n ) = J2 a Pa(0)\a) is the initial state. The 
spectrum of relaxation times r a — 1/Re E a > 
follows directly from the eigenvalues E a of H . 

An important special case arises if the right steady 
state \s) = J2 a -fa q l a ) i s related to a Hamiltonian TL 
through P° q ~ e~ w =/ T . This happens provided the 
detailed balance condition 

w aP Pp = w Pa P^ (4) 

is satisfied and then P° q is indeed a steady-state 
solution of the master equation. Eq. (||) is satisfied 
by w a/3 = fa/3 exp [-(H a - TLp)/2T] provided f a p = 
fp a . Here, we choose w a p = w£l + , where 

-S = ^(l+exp(^^))" (5) 

with Ri + i?2 = 1- Here, a refers to the single- and 
double-monomer moves and To is a microscopic time 
scale. It is understood that w = if there is no 
move of type a linking (3 with a. This choice guar- 
antees that transition rates are finite and bounded 
for all temperatures. In analogy to ref. [^), we focus 
on i?i=0.2 and take the single and two-monomer 
(crankshaft) moves as in ref. B. 

Because of the detailed balance condition, the 
eigenvalues E a are not calculated by diagonalizing 
H directly, but by diagonalizing an auxiliary matrix 
K with elements 

k a/3 = —h a /3 = kp a , (6) 
v a 

where v a — e~ Uc "^ 2T \ The right eigenvectors \E a ) 
oiH are found from the eigenstates K\F a ) = E a \F a ) 
via \E a ) = v a \F a ), but must still be normalized for 



a probabilistic interpretation. It follows that the 
eigenvalues E a are real and positive and that the 
eigenstates span a complete basis jill. The eigen- 
vector corresponding to E a — 0, i.e. to the infi- 
nite relaxation time, determines the equilibrium oc- 
cupancies of the conformations. The longest finite 
relaxation time t± = 1/E\ is found from the small- 
est non-zero eigenvalue E\ . 

The practical calculation of the eigenvalues E a 
and of the lowest two eigenvectors of H is done 
through the standard symmetric Lanczos algorithm 
without reorthogonalization. Since memory require- 
ments are the essential limit of the method, it is 
important that besides the matrix elements, only 
two more vectors have to be kept in memory [ [I2[ . 
To find the eigenvectors, we follow the suggestion 
of Dagotto JH| to run the Lanczos algorithm twice. 
In the first pass, we find the eigenvalues and the 
similarity transformation which diagonalizes the in- 
termediate tridiagonal matrix constructed from K . 
In the second pass, this information is used to accu- 
mulate the eigenvectors from the intermediate vec- 
tors which in this way need not be kept in memory. 
The number of Lanczos iterations needed to achieve 
good convergence varied between 200 and 2500, de- 
pending on the temperature and also the sequence 
considered (for sequence A, the convergence is more 
rapid than for B). The time-dependence state vector 
P{t) at time t = htq can be obtained by applying n 
times the recursion P((n + 1)to) = (1 + H)P(nT ) 
to P in . 

Results: The energies of a sequence are deter- 
mined by the Hamiltonian TL — BijAij, where 
the contact interaction, By , is assigned to monomers 
i and j which are geometrical nearest neighbors 
on the lattice but are not neighbors along the se- 
quence - the condition symbolized by Ay. For 12 
monomers, there are 25 contact energies which we 
pick as Gaussian numbers of unit dispersion and 
with a mean value around -1 to provide an over- 
all attraction |l4| . Sequence B has couplings identi- 
cal in strength to those in A but the assignment to 
monomers is permuted ]T^| . 

The ground state of sequence A is maximally com- 
pact and fills the 3x4 lattice. The probability, that 
it is occupied at time t, Po(t) depends on the ini- 
tial condition. Figure 1 shows the evolution of Po(t) 
from three different initial states. The solid line cor- 
responds to an initial state in which all conforma- 
tions have equal probability of 1/Af of being occu- 
pied. The broken line is for the situation in which 
the system is initially in the NAT. Finally, the dotted 
line corresponds to the initial state being the kinetic 
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"trap" conformation [[L5| which is the strongest ob- 
stacle in reaching equilibrium. 

The trap is determined by studying the eigenvec- 
tor corresponding to the longest relaxation time and 
by identifying the local energy minima which have 
the largest weights at low T. The largest weight is 
associated with the NAT whereas the second largest 
corresponds to the most relevant trap. In the limit 
of T — > 0, weights associated with all other states 
become insignificant. 

Figure 1 shows that the equilibrium value of Pq is 
reached in essentially the same time, independent of 
the initial state because the long time dynamics is 
determined by just one mode with a relaxation time 
T\. The time, ti, needed to reach, say, half of the 
equilibrium value, does depend on the initial state - 
it is significantly longer for the trap state. 

The top of Figure 2 shows the NAT and trap con- 
formations. The latter is 2.5404 energy units above 
the NAT. The overall least costly path (energetically) 
between the trap and the NAT involves at least 10 
steps and requires an increase of 4.5323 above the 
trap energy. The most costly step in this trajectory 
requires an energy of 2.8823 to move monomer 12 
away from monomers 5 and 7. 

Figure 2 summarizes results obtained from the 
master equation, when the initial state is of uniform 
occupancy and compares them to the median fold- 
ing time obtained through Monte Carlo simulations 
which satisfy detailed balance conditions along the 
lines described in ref. [§j. The low temperature be- 
havior of ifoid follows the Arrhenius law, and 5E is 
close to the energy needed to exit the kinetic trap. In 
this region, ifdd is proportional to n. This longest 
relaxation time essentially coincides with ti when 
the kinetic trap is the initial state. 

The Arrhenius behavior sets in fairly close to T min 
where tfdd displays a minimum. On the high tem- 
perature side of T m i n , the characteristic times related 
to the approach to equilibrium no longer have any 
relationship to ifoid and the values of Po are small 
(0.064 at T=1.2). The physical situation changes 
now: reaching the NAT now is controlled by fluctua- 
tions in equilibrium and is governed by the statistics 
of rare events. 

Figure 3 summarizes the dynamical data for se- 
quence B for which Tf is substantially below T m { n 
and signifies bad folding properties. The NAT for B 
is not maximally compact and is doubly degenerate 
as shown at the top of Figure 3. The two states dif- 
fer merely by placement of one monomer and, when 
studying folding, are considered as an effective single 
state. The overall shape of the temperature depen- 



dence of tfoid is similar to that for sequence A and 
the low T Arrhenius law is also obeyed. The ki- 
netic trap state, also shown in Figure 3, is very close 
in shape to the NAT and it differs from it only by 
one contact. This state, however, is very remote ki- 
netically: all trajectories which lead from the trap 
to the NAT must go through an unfolded state and 
take at least 31 steps with the biggest single step 
energy increase of 2.7478. This trap is not in the 
folding funnel of the NAT - the energy landscape is 
thus very rugged. The 8E of the Arrhenius law is 
close to 3.55 and is expected to have a substantial 
entropy contribution at non-zero temperatures due 
to many possible choices of the trajectories. 

The method presented in this letter offers ways of 
studying kinetic traps systematically. In particular, 
existing truncation techniques for the diagonaliza- 
tion of large matrices might be fruitfully employed 
in extending the method to longer protein chains. 
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FIGURE CAPTIONS 

1. Probability of occupation of the native state, 

Po(t), of sequence A, for three values of the 
temperatures, indicated on the right. Pq(oo) 
agrees with the equilibrium value. For se- 
quence B, the values of Po(oo) are significantly 
lower than that for sequence A - for example, 
at T=0.6, P (oo)=0.0752. The circles corre- 
spond to Monte Carlo results, based on 200 
random starting conformations. 

2. Top: The NAT and 'trap' conformations and their 

energies for sequence A. The enlarged circle 
shows the first monomer. Main: Dynamical 
data for the folding. The solid line marked by 
tfoid gives the median folding time derived from 
1000 Monte Carlo trajectories. The solid line 
T\ is the longest relaxation time. The dotted 
line t eq is the time to reach equilibrium from 
the initial state of uniform occpancy. The bro- 
ken line 1 1 with the black circles gives the time 

to reach ^Pq from this initial state. The open 
circles indicate the same but the trap is taken 
as initial state. The dotted line tA is a fit of the 
Monte Carlo data to the Arrhenius law with 
5E=2.76. The arrow at the top indicates the 
value of the folding temperature. 

3. Same as Figure 2, but for sequence B. For the 

curve tA, 5E=3.55. There are two NAT confor- 
mations with the same energy. 
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